CLAIMS

We claim: 

1. A method of generating morphemes from received speech, the method comprising:
selecting candidate sub-morphemes from the received speech;
selecting salient sub-morphemes from the candidate sub-morphemes based on salience measurements; and
clustering the salient sub-morphemes based on similar characteristics into morphemes.

2. The method of claim 1, wherein the generated morphemes are one of acoustic and non-acoustic morphemes.

3. The method of claim 1, wherein the similar characteristics used to cluster the salient sub-morphemes are semantic and syntactic similarities.

4. The method of claim 1, wherein the generated morphemes are used by a speech recognition and understanding system.

5. The method of claim 1, wherein the received speech is training speech.

6. The method of claim 5, wherein the step of selecting candidate sub-morphemes further comprises:
filtering the training speech;
selecting all observed phone sequences of a predetermined length; and
selecting as candidate sub-morphemes the phone sequences that are of at least the predetermined length.
7. The method of claim 4, wherein the training speech comprises at least one of verbal and non-verbal speech.

8. The method of claim 7, wherein the non-verbal speech comprises the use of at least one of gestures, body movements, head movements, non-responses, text, keyboard entries, keypad entries, mouse clicks, DTMF codes, pointers, stylus, cable set-top box entries, graphical user interface entries and touchscreen entries.

9. The method of claim 1, wherein the speech includes multimodal forms.

10. The method of claim 1, wherein the speech is one of transcribed and untranscribed.

11. The method of claim 1, wherein the salient sub-morphemes are selected using a test for significance.

12. The method of claim 1, wherein the salient sub-morphemes are clustered into morphemes using a distortion measure between the salient sub-morphemes.

13. A computer-readable medium storing a database of morphemes generated from received speech, the database generated according to a method comprising:
selecting candidate sub-morphemes from the received speech;
selecting salient sub-morphemes from the candidate sub-morphemes based on salience measurements; and
clustering the salient sub-morphemes based on similar characteristics into morphemes.

14. The computer-readable medium of claim 13, wherein the generated morphemes are one of acoustic and non-acoustic morphemes.

15. The computer-readable medium of claim 13, wherein the similar characteristics used to cluster the salient sub-morphemes are semantic and syntactic similarities.
16. The computer-readable medium of claim 13, wherein the generated morphemes are used by a speech recognition and understanding system.

17. A natural spoken language system having speech recognition and speech understanding modules, the natural language system using morphemes generated by a method comprising:
selecting candidate sub-morphemes from received speech;
selecting salient sub-morphemes from the candidate sub-morphemes based on salience measurements; and
clustering the salient sub-morphemes based on similar characteristics into morphemes.

18. The natural language system of claim 17, wherein the morphemes are one of acoustic and non-acoustic morphemes.

19. The natural language system of claim 17, wherein the received speech is training speech that includes at least one multimodal component.

20. The natural language system of claim 19, wherein the at least one multimodal component comprises one of gestures, body movements, head movements, non-responses, text, keyboard entries, keypad entries, mouse clicks, DTMF codes, pointers, stylus, cable set-top box entries, graphical user interface entries and touchscreen entries.
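
Illustrative sketch (not part of the claims). The three steps recited in claims 1, 13 and 17 (candidate selection, salience-based selection, clustering) might be sketched in Python as follows. Everything specific here is an assumption made for illustration and is not taken from the claims: the function names, the representation of training speech as labeled phone sequences, the length and count limits, the use of a label posterior threshold as the salience measurement (standing in for the significance test of claim 11), and the use of normalized edit distance as the distortion measure of claim 12.

```python
from collections import Counter, defaultdict


def candidate_fragments(utterances, min_len=2, max_len=4, min_count=2):
    """Collect phone n-grams (candidate sub-morphemes) with per-label counts.

    `utterances` is a list of (phones, label) pairs; this input format and
    the length/count limits are assumptions for illustration only.
    """
    counts = defaultdict(Counter)  # fragment -> Counter of labels
    for phones, label in utterances:
        seen = set()
        for n in range(min_len, max_len + 1):
            for i in range(len(phones) - n + 1):
                seen.add(tuple(phones[i:i + n]))
        for frag in seen:  # count each fragment once per utterance
            counts[frag][label] += 1
    return {f: c for f, c in counts.items() if sum(c.values()) >= min_count}


def salient_fragments(counts, min_posterior=0.9):
    """Keep fragments whose most frequent label has a high enough posterior.

    A plain threshold is used here as a stand-in salience measurement.
    """
    salient = {}
    for frag, label_counts in counts.items():
        label, hits = label_counts.most_common(1)[0]
        if hits / sum(label_counts.values()) >= min_posterior:
            salient[frag] = label
    return salient


def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def cluster_fragments(salient, max_distortion=0.5):
    """Greedily group salient fragments that share a label and are close
    to a cluster's first member under normalized edit distance."""
    clusters = []  # list of (label, [fragments])
    for frag in sorted(salient, key=lambda f: (-len(f), f)):
        label = salient[frag]
        for c_label, members in clusters:
            rep = members[0]
            distortion = edit_distance(frag, rep) / max(len(frag), len(rep))
            if c_label == label and distortion <= max_distortion:
                members.append(frag)
                break
        else:
            clusters.append((label, [frag]))
    return clusters


if __name__ == "__main__":
    # Toy, made-up training data: (phone sequence, call-type label).
    toy = [
        (["k", "ah", "l", "eh", "k", "t"], "collect"),
        (["ah", "k", "ah", "l", "eh", "k", "t"], "collect"),
        (["k", "r", "eh", "d", "ih", "t"], "credit"),
        (["m", "ay", "k", "r", "eh", "d", "ih", "t"], "credit"),
    ]
    for label, members in cluster_fragments(salient_fragments(candidate_fragments(toy))):
        print(label, members)
```

Each resulting cluster plays the role of one morpheme: a set of similar phone fragments tied to the same meaning. The greedy single-pass grouping keeps the sketch short; an agglomerative procedure with a different distortion measure would fit the same claim language equally well.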